Identifying Genres of Web Pages

Author

  • Marina Santini
Abstract

In this paper, we present an inferential model for text type and genre identification of web pages, where text types are inferred using a modified form of Bayes’ theorem, and genres are derived using a few simple if-then rules. As the genre system on the web is a complex reality, and web pages are much more unpredictable and individualized than paper documents, we propose this approach as an alternative to unsupervised and supervised techniques. The inferential model allows a classification that can accommodate genres that are not entirely standardized, and is more respectful of the actual nature of a web page, which is mixed, rarely corresponding to an ideal type and often showing a mixture of genres or no genre at all. A proper evaluation of such a model remains an open issue.

Keywords: genre, text typologies, web pages, deductive-inductive model, automatic identification, Bayes’ theorem
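
The two-stage idea in the abstract (probabilistic inference of text types, then if-then rules for genres) can be illustrated with a minimal sketch. The abstract does not say how Bayes’ theorem is modified, so the odds-likelihood form used here is only an assumed stand-in. The text type names, likelihood ratios, threshold, and genre rules are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the text types, likelihood ratios, threshold,
# and genre rules below are hypothetical, not taken from the paper.

def posterior_probability(prior, likelihood_ratios):
    """Update a prior using the odds form of Bayes' theorem.

    posterior odds = prior odds * product of the likelihood ratios
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


def infer_text_types(evidence, prior=0.5):
    """Return a posterior probability for each text type."""
    return {
        text_type: posterior_probability(prior, ratios)
        for text_type, ratios in evidence.items()
    }


def derive_genres(text_types, threshold=0.7):
    """Apply simple if-then rules; a page may get zero, one, or several genres."""
    genres = []
    if text_types.get("argumentative", 0.0) >= threshold:
        genres.append("editorial")
    if text_types.get("instructional", 0.0) >= threshold:
        genres.append("how-to")
    if text_types.get("narrative", 0.0) >= threshold:
        genres.append("blog")
    return genres


# Hypothetical likelihood ratios observed for one web page.
evidence = {
    "argumentative": [2.0, 3.0],
    "instructional": [0.5, 0.5],
    "narrative": [4.0],
}
text_types = infer_text_types(evidence)
genres = derive_genres(text_types)
print(text_types)
print(genres)
```

Because the rules are applied independently, the same page can receive several genre labels or none, which mirrors the abstract's point that web pages often show a mixture of genres or no genre at all.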

Similar resources

An n-gram Based Approach to the Classification of Web Pages by Genre

The extraordinary growth in both the size and popularity of the World Wide Web has created a growing interest not only in identifying Web page genres, but also in using these genres to classify Web pages. The hypothesis of this research is that an n-gram representation of a Web page can be used effectively to automatically classify that Web page by genre. This research involves the development ...

Genres In Formation? An Exploratory Study of Web Pages using Cluster Analysis

The Web is a new, large and heterogeneous community where the interaction among the users and the possibility offered by technology may modify existing genres or create new ones. In fact, most genres being borrowed from the paper world have undergone adjustments when moving on to the Web (for instance, online newspapers and online manuals). Also, there is a family of genres, which have been cre...

Cybergenre: Automatic Identification of Home Pages on the Web

The research reported in this paper is part of a larger project on the automatic classification of web pages by their genres. The long term goal is the incorporation of web page genre into the search process to improve the quality of the search results. In this phase, a neural net classifier was trained to distinguish home pages from non-home pages and to classify those home pages as personal h...

Reproduced and emergent genres of communication on the World-Wide Web

The World-Wide Web is growing quickly and being applied to many new types of communications. As a basis for studying organizational communications, Yates and Orlikowski (1992; Orlikowski and Yates, 1994) proposed using genres. They defined genres as "typified communicative actions characterized by similar substance and form and taken in response to recurrent situations" (Yates and Orlikowski, 1...

Genre Analysis of Bookmarked Web Pages

Purpose – A total of 17 user-compiled collections of webpages, comprising 833 bookmarked links in terms of genre, are studied. The purpose of this paper is to find out whether users tend to bookmark certain web genres more than others. Genre theory helps to make sense of the different pages included in these collections, and to classify them, according to their communicative purpose and salient...

Publication date: 2006